Using Twitter data for demographic research

Authors

  • Dilek Yildiz
  • Jo Munson
  • Agnese Vitali
  • Ramine Tinati
  • Jennifer A. Holland
  • Emilio Zagheni

Affiliations

  • Wittgenstein Centre for Demography and Global Human Capital (IIASA, VID/ÖAW, WU), Austria
  • University of Southampton, UK
  • Erasmus Universiteit Rotterdam, the Netherlands

Source: http://www.demographic-research.org

Abstract

BACKGROUND Social media data is a promising source of social science data. However, deriving the demographic characteristics of users and dealing with the nonrandom, nonrepresentative populations from which they are drawn represent challenges for social scientists.
OBJECTIVE Given the growing use of social media data in social science research, this paper asks two questions: 1) To what extent are findings obtained with social media data generalizable to broader populations, and 2) what is the best practice for estimating demographic information from Twitter data?
METHODS Our analyses use information gathered from 979,992 geo-located Tweets sent by 22,356 unique users in South East England between 23 June and 4 July 2014. We estimate demographic characteristics of the Twitter users with the crowd-sourcing platform CrowdFlower and the image-recognition software Face++. To evaluate bias in the data, we run a series of log-linear models with offsets and calibrate the nonrepresentative sample of Twitter users with mid-year population estimates for South East England.
RESULTS CrowdFlower proves to be more accurate than Face++ for the measurement of age, whereas both tools are highly reliable for measuring the sex of Twitter users. The calibration exercise allows bias correction in the age-, sex-, and location-specific population counts obtained from the Twitter population by augmenting Twitter data with mid-year population estimates.
CONTRIBUTION The paper proposes best practices for estimating Twitter users’ basic demographic characteristics and a calibration method to address the selection bias in the Twitter population, allowing researchers to generalize findings based on Twitter to the general population.
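
The calibration idea mentioned in the abstract can be illustrated with a minimal post-stratification sketch. This is not the authors' log-linear model with offsets; it is a simpler cell-ratio reweighting that conveys the same intuition. All counts, cell labels, and function names below are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (sex, age group); hypothetical cell definition


def representation_ratios(sample: Dict[Cell, int],
                          population: Dict[Cell, int]) -> Dict[Cell, float]:
    """Share of each cell in the sample divided by its share in the population.

    A value above 1 means the cell is over-represented among sampled users.
    """
    n_sample = sum(sample.values())
    n_pop = sum(population.values())
    return {
        cell: (sample.get(cell, 0) / n_sample) / (population[cell] / n_pop)
        for cell in population
    }


def calibration_weights(sample: Dict[Cell, int],
                        population: Dict[Cell, int]) -> Dict[Cell, float]:
    """Weight per sampled user so that weighted cell counts match the population."""
    weights = {}
    for cell, pop_count in population.items():
        sample_count = sample.get(cell, 0)
        if sample_count == 0:
            continue  # an empty cell cannot be reweighted
        weights[cell] = pop_count / sample_count
    return weights


# Hypothetical counts, NOT taken from the paper.
twitter_users = {("F", "16-24"): 50, ("M", "16-24"): 100,
                 ("F", "25-44"): 40, ("M", "25-44"): 10}
population = {("F", "16-24"): 1000, ("M", "16-24"): 1000,
              ("F", "25-44"): 2000, ("M", "25-44"): 2000}

ratios = representation_ratios(twitter_users, population)
weights = calibration_weights(twitter_users, population)
calibrated = {cell: twitter_users[cell] * w for cell, w in weights.items()}
```

After reweighting, the weighted user count in every non-empty cell equals the corresponding population count, which is the basic property any such calibration is meant to achieve. A model-based approach such as the one described in the abstract additionally smooths across cells instead of treating each one in isolation.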

Related articles

Using Twitter for Demographic and Social Science Research: Tools for Data Collection and Processing.

Despite recent and growing interest in using Twitter to examine human behavior and attitudes, there is still significant room for growth regarding the ability to leverage Twitter data for social science research. In particular, gleaning demographic information about Twitter users, a key component of much social science research, remains a challenge. This article develops an accurate and reliable ...

Who Tweets in the United Kingdom? Profiling the Twitter Population Using the British Social Attitudes Survey 2015

The headache any researcher faces while using Twitter data for social scientific analysis is that we do not know who tweets. In this article, we report on results from the British Social Attitudes Survey (BSA) 2015 on Twitter use. We focus on associations between using Twitter and three demographic characteristics—age, sex, and class (defined here as National Statistics SocioEconomic Classifica...

Demographic Breakdown of Twitter Users: An analysis based on names

We propose an approach for age estimation using solely people’s first names by extending an existing method proposed by Chang et al. for ethnicity estimation. We demonstrate that the proposed method is able to predict the age of a person as well as the age breakdown of an entire population better than the natural alternatives. We then apply both the age and the ethnicity method to Twitter US us...

How Does Twitter User Behavior Vary Across Demographic Groups?

Demographically-tagged social media messages are a common source of data for computational social science. While these messages can indicate differences in beliefs and behaviors between demographic groups, we do not have a clear understanding of how different demographic groups use platforms such as Twitter. This paper presents a preliminary analysis of how groups’ differing behaviors may confo...

Who Tweets? Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-Data

This paper specifies, designs and critically evaluates two tools for the automated identification of demographic data (age, occupation and social class) from the profile descriptions of Twitter users in the United Kingdom (UK). Meta-data routinely collected through the Collaborative Social Media Observatory (COSMOS: http://www.cosmosproject.net/) relating to UK Twitter users is matched wit...

Publication date: 2017